Statement of the problem studied


Sparsity-based  methods  have  recently  been  suggested  for  tasks  such  as  face  and  iris  recognition. 
In  this  project,  we  evaluated  the  effectiveness  of  such  methods  for  automatic  target  recognition 
in  infrared  images.  We  show  how  sparsity  can  be  helpful  for  efficient  utilization  of  data  for 
target  recognition.  We  evaluated  the  effectiveness  of  the  proposed  algorithm  in  terms  of 
recognition rate and confusion matrices on the well-known Comanche forward-looking infrared
(FLIR) data set consisting of ten different military targets at different orientations. This work
was done in collaboration with Dr. Nasser Nasrabadi, Chief Scientist, SEDD, Army Research
Laboratory. This work will be presented at the International Conference on Image Processing
being held in Hong Kong in September 2010. A journal paper reporting our work is under
preparation.


Summary  of  the  most  important  results 

The objective of an Automatic Target Recognition (ATR) algorithm is to detect each target and
classify it into one of a number of classes. The recognition algorithm may consist of several
stages. For example, in the first stage, a target is detected in the entire image; in the second stage,
background clutter is removed; in the third stage, a set of features is computed; and finally, in
the fourth stage, classification is done by means of a classifier. In this work, we mainly focus on
the last two stages.

Target  recognition  using  forward-looking  infrared  (FLIR)  imagery  of  different  targets  in  natural 
scenes is difficult due to high variation in the thermal signatures of targets. Many ATR
algorithms  have  been  proposed  for  FLIR  imagery.  Wang  et  al.  proposed  a  modular  neural 
network-based  ATR  algorithm  in  [1].  In  their  algorithm,  several  neural  networks  are  trained, 
each  optimized  for  a  local  region  in  the  image,  whose  classification  decisions  are  combined  to 
determine  the  final  classification.  Wavelet-based  vector  quantization  was  used  for  FLIR  ATR  in 
[2]  by  Chan  and  Nasrabadi,  where  a  discriminative  dictionary  was  created  in  the  wavelet  domain 
using learning vector quantization. A recognition method based on a hidden Markov tree that uses
a Karhunen-Loeve representation was proposed by Bharadwaj and Carin in [3]. See [4] for an
excellent survey of papers and experimental evaluation of FLIR ATR. The algorithms evaluated
in [4] include convolutional neural network (CNN), principal component analysis (PCA), linear
discriminant  analysis  (LDA),  learning  vector  quantization  (LVQ),  modular  neural  networks 
(MNN),  and  two  model-based  algorithms,  using  Hausdorff  metric-based  matching  (H-M)  and 
geometric  hashing  (G-H). 

FLIR images often contain unwanted thermal signatures of background clutter whose
characteristics change with environmental conditions such as fog, rain and heat, which can make
target detection and recognition difficult for automated as well as human observers. Recently,
Wright et al. [5] introduced a sparse representation-based classification (SRC) algorithm for face
recognition that is robust to varying expression, illumination, occlusion and disguise, and that
outperformed many state-of-the-art algorithms. This approach is based on the theories of
Compressive Sensing (CS) and Sparse Representation (SR). The idea is to create a dictionary
matrix with the training samples as column vectors. The test sample is also represented as a column
vector. Different dimensionality reduction methods are used to reduce the dimension of both the
test vector and the vectors in the dictionary. One such approach for dimensionality reduction is
random projections [5]. Random projections, using a randomly generated sensing matrix, are taken
of both the dictionary matrix and the test sample. It is then simply a matter of solving an ℓ1
minimization problem in order to obtain the sparse solution. Once the sparse solution is obtained,
it indicates which training samples the test vector is most closely related to.
Furthermore, it was shown that, if the sparsity of the solution is properly harnessed, the choice of
features (e.g. dimensionality reduction method) is no longer critical. Instead, the number of
features for a given class and the sparse solution become critical.
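
The SRC pipeline described above can be illustrated with a minimal Python sketch. It is not the
implementation of [5]: it runs on synthetic data, it solves a LASSO-type relaxation of the ℓ1
problem with a plain iterative soft-thresholding (ISTA) loop, and the names ista_l1 and
src_classify, the regularization weight lam and the toy dimensions are all assumptions made for
illustration.

```python
import numpy as np

def ista_l1(A, y, lam=0.01, n_iter=500):
    """Approximately solve min_x 0.5*||A x - y||_2^2 + lam*||x||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))        # gradient step on the data-fit term
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft thresholding
    return x

def src_classify(D, labels, y, Phi, lam=0.01):
    """Assign y to the class whose training columns give the smallest residual."""
    A = Phi @ D                                   # random projection of the dictionary
    A = A / np.linalg.norm(A, axis=0)             # unit-norm columns
    b = Phi @ y                                   # same projection of the test sample
    b = b / np.linalg.norm(b)
    x = ista_l1(A, b, lam)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(b - A[:, labels == c] @ x[labels == c]) for c in classes]
    return classes[int(np.argmin(residuals))], x

# Synthetic stand-in for training chips: each class lies in its own 3-D subspace.
rng = np.random.default_rng(0)
n_pixels, n_classes, per_class, n_feat = 400, 3, 20, 64
bases = [rng.standard_normal((n_pixels, 3)) for _ in range(n_classes)]
D = np.hstack([B @ rng.standard_normal((3, per_class)) for B in bases])
labels = np.repeat(np.arange(n_classes), per_class)
Phi = rng.standard_normal((n_feat, n_pixels)) / np.sqrt(n_feat)  # sensing matrix

y = bases[1] @ rng.standard_normal(3)             # test sample drawn from class 1
predicted, x = src_classify(D, labels, y, Phi)
print("predicted class:", predicted)
```

The class decision uses the per-class reconstruction residual, which is the rule commonly
associated with SRC; other decision rules on the sparse coefficient vector are possible.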

Motivated  by  the  SRC  algorithm,  in  this  effort,  we  extended  the  use  of  SR  and  CS  for  the 
recognition  of  FLIR  target  images.  In  particular,  we  exploited  the  inherent  block  structure  of  the 
sparse solution induced by ℓ1-minimization. Furthermore, our method utilizes a redundant
dictionary that includes training data at various azimuth angles, hence achieving orientation
invariance. As a result, our algorithm has the ability to identify targets at different orientations.
The details of the algorithm are presented in [6, 7].
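
To make the notion of block structure concrete, the following Python sketch groups the
dictionary columns by target class and promotes solutions that use few blocks. It is only an
illustration under stated assumptions: it uses a group-lasso (mixed ℓ2/ℓ1) penalty solved by
proximal gradient descent, which may differ from the formulation in [6, 7], and the function
names, the weight lam, the block-energy decision rule and the synthetic data are hypothetical.

```python
import numpy as np

def block_soft_threshold(x, block_ids, tau):
    """Proximal operator of tau * sum_k ||x_k||_2: shrink each block's l2 norm by tau."""
    out = np.zeros_like(x)
    for k in np.unique(block_ids):
        idx = block_ids == k
        norm = np.linalg.norm(x[idx])
        if norm > tau:
            out[idx] = (1.0 - tau / norm) * x[idx]
    return out

def block_sparse_code(A, y, block_ids, lam=0.05, n_iter=300):
    """Approximately solve min_x 0.5*||A x - y||_2^2 + lam * sum_k ||x_k||_2."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        gradient_step = x - step * (A.T @ (A @ x - y))
        x = block_soft_threshold(gradient_step, block_ids, lam * step)
    return x

# Synthetic dictionary: one block of columns per class (think: one class at many azimuths).
rng = np.random.default_rng(1)
n_feat, n_classes, per_class = 64, 4, 18
bases = [rng.standard_normal((n_feat, 4)) for _ in range(n_classes)]
A = np.hstack([B @ rng.standard_normal((4, per_class)) for B in bases])
A /= np.linalg.norm(A, axis=0)
block_ids = np.repeat(np.arange(n_classes), per_class)

y = A[:, block_ids == 2] @ rng.standard_normal(per_class)   # test vector from class 2
y /= np.linalg.norm(y)

x = block_sparse_code(A, y, block_ids)
energies = np.array([np.linalg.norm(x[block_ids == k]) for k in range(n_classes)])
predicted = int(np.argmax(energies))              # class with the most active block
print("block energies:", np.round(energies, 3), "predicted class:", predicted)
```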

Experiments 

We evaluated our method on the Comanche FLIR data set consisting of different military targets
at different orientations. The images are of size 40x75 pixels. In all of our experiments, the
dimension of each target image (chip) was reduced from 40x75 to 16x16. A number of approaches
have been suggested for solving the block sparsity (BS) promoting optimization problem
(15). In our approach, we employed the spectral projected gradient (SPGL1) algorithm [8], a
highly efficient algorithm that is suitable for large-scale applications. The performance
of our algorithm is compared with that of several different methods reported in [1], [2], [4]. Our
algorithm is also tested using several features, namely PCA features, random projection (RP)
features, 2D Haar wavelet features, and downsampled images.
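
As a rough illustration of how a 40x75 chip can be mapped to a 256-dimensional feature vector,
the Python sketch below computes downsampled, RP and PCA features on random stand-in data. The
helper names, the nearest-neighbour subsampling rule and the Gaussian sensing matrix are
assumptions made for this sketch, not the exact procedures used in the experiments; Haar wavelet
features are omitted.

```python
import numpy as np

def downsample(chip, out_shape=(16, 16)):
    """Nearest-neighbour subsampling of a 2-D chip, returned as a flat vector."""
    rows = np.linspace(0, chip.shape[0] - 1, out_shape[0]).round().astype(int)
    cols = np.linspace(0, chip.shape[1] - 1, out_shape[1]).round().astype(int)
    return chip[np.ix_(rows, cols)].ravel()

def random_projection_matrix(n_features, n_pixels, rng):
    """Gaussian sensing matrix that maps n_pixels values to n_features values."""
    return rng.standard_normal((n_features, n_pixels)) / np.sqrt(n_features)

def pca_basis(X, n_features):
    """X holds one vectorised training chip per column; returns (mean, projection matrix)."""
    mean = X.mean(axis=1)
    U, _, _ = np.linalg.svd(X - mean[:, None], full_matrices=False)
    return mean, U[:, :n_features].T              # rows are principal directions

rng = np.random.default_rng(0)
train_chips = rng.random((300, 40, 75))           # stand-in for real training chips
X = train_chips.reshape(300, -1).T                # 3000 x 300 data matrix
test_chip = rng.random((40, 75))

f_down = downsample(test_chip)                    # 16 x 16 = 256 values
Phi = random_projection_matrix(256, 40 * 75, rng)
f_rp = Phi @ test_chip.ravel()
mean, W = pca_basis(X, 256)
f_pca = W @ (test_chip.ravel() - mean)
print(f_down.shape, f_rp.shape, f_pca.shape)
```

Whatever feature map is chosen, the same map has to be applied to every dictionary column and to
the test chip so that both live in the same 256-dimensional space.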

In our data set, there are 10 different vehicle targets. We will denote these targets as TG1, TG2,
..., TG10. For each target, there are 72 orientations, corresponding to the aspect angles of
0°, 5°, ..., 355° in azimuth. The range to all the targets is given so that all the
target chips are analyzed at 2 kilometers. The data consists of a training set and a test set. We
will refer to the training set as the SIG set and the test set as the ROI set. The SIG data set has about
13,816 target chips, while there are 3,353 images in the ROI data set. The SIG data set consists
of images that were collected under very favorable conditions. It contains 874
to 1468 images per target class, spanning 72 different aspects.

The ROI set consists of only five targets, namely TG1, TG2, TG3, TG4 and TG7. The target
images for the ROI set were taken under less favorable conditions, such as different
weather conditions, different backgrounds, and targets in and around clutter; hence, this data is very
challenging. There are 577 to 798 images for each of these five target classes. All the images in
the SIG and ROI sets were normalized to a fixed range with the target placed approximately in the
center. The orientation in the ROI set was given only coarsely, every 45 degrees.


In the first set of experiments, the training and test images were chosen from the SIG data set.
For training, we randomly chose 11 target chips for each target per aspect angle; this set is called
TRAINSIG. Since we have a total of 72 aspects (i.e., 0°, 5°, ..., 355°) for each
target, we used a total of 11 x 72 = 792 target chips per class. The probabilities of correct classification
for  these  experiments  are  98.48%,  99.18%,  99.96%  and  99.95%  for  the  downsampled,  RP,  PCA 
and  Haar  wavelet  features,  respectively.  All  the  features  performed  approximately  the  same  for 
these  experiments. 
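
The training-set selection described above can be sketched in a few lines of Python. The index
format (chip identifier, target name, aspect angle), the function name sample_training_set and
the toy index with 15 chips per target and aspect are hypothetical; the real SIG set is not
evenly distributed over aspects.

```python
import random
from collections import defaultdict

def sample_training_set(chip_index, per_aspect=11, seed=0):
    """Pick per_aspect chips at random for every (target, aspect) pair.

    chip_index is a list of (chip_id, target, aspect_deg) tuples. random.sample raises
    ValueError if some pair has fewer than per_aspect chips available.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for chip_id, target, aspect in chip_index:
        groups[(target, aspect)].append(chip_id)
    chosen = []
    for key in sorted(groups):
        chosen.extend(rng.sample(groups[key], per_aspect))
    return chosen

# Toy index: 10 targets, 72 aspects (0, 5, ..., 355 degrees), 15 chips for each pair.
targets = [f"TG{i}" for i in range(1, 11)]
aspects = list(range(0, 360, 5))
chip_index = [(f"{t}_{a:03d}_{k:02d}", t, a)
              for t in targets for a in aspects for k in range(15)]

chosen = sample_training_set(chip_index)
per_class = len(chosen) // len(targets)           # 11 * 72 chips per class
dictionary_shape = (16 * 16, len(chosen))         # features x training chips
print(per_class, dictionary_shape)
```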

In the second set of experiments, we again randomly selected 11 targets per aspect angle from
the SIG data set for training. Again, the resulting dictionary A is of size 256 x 7920
(16 x 16 = 256 features by 10 x 792 = 7920 training chips). We
randomly selected 1000 images from the ROI set for testing, called the TEST-ROI set. We
extracted various features and applied our BS-based algorithm on these features as was done for
the TRAINSIG dataset. The probabilities of correct classification for these experiments are
75.10%, 76.30%, 78.89% and 76.45% for the downsampled, RP, PCA and Haar wavelet features,
respectively. Again, PCA features gave the best performance. In these experiments, the TEST-ROI
set contained only five targets, but all ten outputs were active.

The  best  recognition  results  on  the  TEST-SIG  and  TEST-ROI  data  sets  were  obtained  by  using 
the PCA features. Our method achieves recognition rates of 99.96% and 78.89% on TEST-SIG
and TEST-ROI, respectively, and it outperforms the other methods such as CNN, MNN, PCA,
LVQ, LDA, H-M and G-H [1], [2], [4]. Also, note that our method is more general than the
competing methods presented in [1] and [2]. To deal with background
artifacts, those methods use several rectangular windows of different sizes based on the ground truth
silhouette  computer-aided  design  models.  As  a  result,  their  performance  significantly  depends  on 
the  choice  of  windows.  In  contrast,  the  method  presented  here  does  not  require  any  windowing 
or  prior  knowledge  about  the  size  of  the  targets. 
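
Recognition rate and confusion matrices, the two measures used in this evaluation, can be
computed as in the Python sketch below. The label vectors are made up for illustration and are
not experimental results; classes are indexed 0 to 9 for TG1 to TG10, and the test labels use
only the five ROI targets while predictions may fall on any of the ten classes.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """C[i, j] counts test chips of true class i that were assigned to class j."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        C[t, p] += 1
    return C

def recognition_rate(C):
    """Fraction of test chips on the diagonal, i.e. correctly classified."""
    return np.trace(C) / C.sum()

# Made-up labels: true classes come from TG1, TG2, TG3, TG4, TG7 (indices 0, 1, 2, 3, 6).
true_labels = [0, 0, 1, 2, 3, 6, 6, 6]
predicted_labels = [0, 9, 1, 2, 3, 6, 4, 6]

C = confusion_matrix(true_labels, predicted_labels, n_classes=10)
rate = recognition_rate(C)
print(C)
print("recognition rate:", rate)
```

Rows for classes that never occur in the test set stay at zero, while their columns can still
collect misclassified chips.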

Summary  and  conclusions 

We  have  developed  a  framework  for  ATR  using  the  theory  of  sparse  representations  and 
compressive  sensing.  This  entails  solving  a  block-sparsity  promoting  optimization  problem  on 
various  features.  Various  experiments  on  the  Comanche  FLIR  data  set  have  shown  promising 
results.  Several  future  directions  of  inquiry  are  possible  considering  our  new  approach  to  ATR. 
For instance, instead of using ℓ1 minimization, one can consider greedy pursuits such as
orthogonal matching pursuit and CoSaMP [9], [10], [11]. Greedy pursuits are known to converge
much faster than the optimization-based methods and have the same theoretical guarantees as
some of the optimization-based methods. Note that the sparsity-motivated methods for ATR
presented here for FLIR images can be easily extended to other ATR problems, such as
those based on synthetic aperture radar imagery.
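
For reference, a minimal Python sketch of orthogonal matching pursuit, one of the greedy
pursuits mentioned above, is given here. It is a textbook version run on a synthetic sparse
recovery problem, not an evaluation on FLIR data; the function name omp, the fixed sparsity
level and the toy dimensions are assumptions.

```python
import numpy as np

def omp(A, y, sparsity):
    """Greedy recovery of a sparsity-sparse x with A x close to y (columns of A unit norm)."""
    residual = y.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(sparsity):
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0           # never pick the same column twice
        support.append(int(np.argmax(correlations)))
        # Least-squares fit of y on all columns selected so far.
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs
    x = np.zeros(A.shape[1])
    x[support] = coeffs
    return x

rng = np.random.default_rng(0)
m, n, k = 128, 256, 4
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)

x_true = np.zeros(n)
true_support = rng.choice(n, size=k, replace=False)
x_true[true_support] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
y = A @ x_true                                    # noiseless measurements

x_hat = omp(A, y, sparsity=k)
print("max recovery error:", np.max(np.abs(x_hat - x_true)))
```

Each iteration costs one matrix-vector product and one small least-squares solve, which is the
source of the speed advantage over optimization-based solvers noted above.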